Generalization error bounds for two-layer neural networks with Lipschitz loss function
Jiang Yu Nguwi, Nicolas Privault
We derive generalization error bounds for the training of two-layer neural networks without assuming boundedness of the loss function, using Wasserstein distance estimates on the discrepancy between a probability distribution and its associated empirical measure, together with moment bounds for the associated stochastic gradient method. In the case of independent test data, we obtain a dimension-free rate of order $O(n^{-1/2})$ on the $n$-sample generalization error, whereas without the independence assumption we derive a bound of order $O(n^{-1/(d_{\rm in}+d_{\rm out})})$, where $d_{\rm in}$ and $d_{\rm out}$ denote the input and output dimensions. Our bounds and their coefficients can be explicitly computed prior to the training of the model, and are confirmed by numerical simulations.
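As a rough illustration of the setting (a hedged sketch, not the paper's construction), one can train a small two-layer network with SGD under a 1-Lipschitz loss and measure the empirical gap between training and test loss. The widths, the mean-field 1/m scaling, and the synthetic target below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, m, n = 5, 1, 50, 200

def net(x, W, a):
    # two-layer network a^T tanh(W x) / m (mean-field scaling, an assumption)
    return np.tanh(x @ W.T) @ a / m

def loss(pred, y):
    return np.abs(pred - y).mean()   # absolute loss: 1-Lipschitz, unbounded

# synthetic data from a smooth target (illustrative)
X = rng.normal(size=(n, d_in)); y = np.sin(X.sum(axis=1, keepdims=True))
Xte = rng.normal(size=(n, d_in)); yte = np.sin(Xte.sum(axis=1, keepdims=True))

W = rng.normal(size=(m, d_in)); a = rng.normal(size=(m, d_out))
eta = 0.1
for step in range(2000):
    i = rng.integers(n)                        # one-sample stochastic gradient
    x, t = X[i:i+1], y[i:i+1]
    h = np.tanh(x @ W.T)                       # hidden activations, shape (1, m)
    pred = h @ a / m
    g = np.sign(pred - t)                      # subgradient of the absolute loss
    a -= eta * (h.T @ g) / m
    W -= eta * ((g @ a.T) * (1 - h**2)).T @ x / m

gap = abs(loss(net(X, W, a), y) - loss(net(Xte, W, a), yte))
print(f"empirical generalization gap: {gap:.4f}")
```

Re-running this with increasing `n` makes the measured gap shrink, which is the quantity the bounds above control.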
Feature learning via mean-field Langevin dynamics: classifying sparse parities and beyond
Taiji Suzuki, Denny Wu
Mean-field Langevin dynamics (MFLD) (Mei et al., 2018; Hu et al., 2019) is particularly attractive. MFLD arises from a noisy gradient descent update on the parameters, where Gaussian noise is injected into the gradient to encourage "exploration". Furthermore, uniform-in-time estimates of the particle discretization error have also been established (Suzuki et al.). The goal of this work is to address the following question.
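The noisy gradient descent update behind MFLD can be sketched as follows (a minimal illustration on a hypothetical quadratic potential, not the paper's objective): each particle takes a gradient step plus injected Gaussian noise of variance 2*eta*lam, discretizing a Langevin diffusion whose stationary law is proportional to exp(-V/lam):

```python
import numpy as np

rng = np.random.default_rng(1)
m, d, eta, lam = 100, 2, 0.05, 0.01   # particles, dimension, stepsize, temperature

def grad_V(theta):
    # hypothetical smooth potential V(theta) = |theta|^2 / 2 (an assumption)
    return theta

theta = rng.normal(size=(m, d))
for _ in range(500):
    noise = rng.normal(size=(m, d))
    # noisy gradient descent: gradient step + Gaussian "exploration" noise
    theta = theta - eta * grad_V(theta) + np.sqrt(2 * eta * lam) * noise

# for this quadratic V, the stationary law is Gaussian with variance ~ lam
print(theta.var())
```

Setting `lam = 0` recovers plain gradient descent on the particles; the noise term is what turns the particle system into a sampler for the mean-field Gibbs measure.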
In this work, we show that this "lazy training" phenomenon is not specific to overparameterized neural networks, and is due to a choice of scaling, often implicit, that makes the model behave as its linearization around the initialization, thus yielding a model equivalent to learning with positive-definite kernels.
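The scaling effect can be illustrated on a toy single-output model (a hedged sketch; the model `f`, the step-size scaling, and the target below are illustrative assumptions, not the paper's setup): when the output deviation from initialization is multiplied by a factor `alpha`, fitting a target of fixed size only requires parameter moves of order 1/alpha, so the trained parameters stay near initialization, where the model is well approximated by its linearization:

```python
import numpy as np

rng = np.random.default_rng(2)
d = 10
w0 = 0.1 * rng.normal(size=d)   # small init so tanh stays non-saturated
x = rng.normal(size=d)

def f(w):
    # hypothetical nonlinear model (an assumption for illustration)
    return np.tanh(w @ x)

def train(alpha, target=1.0, eta=0.01, steps=2000):
    # minimize 0.5 * (alpha * (f(w) - f(w0)) - target)**2 over w,
    # with the step size scaled by 1/alpha**2 as in lazy-training analyses
    w = w0.copy()
    for _ in range(steps):
        r = alpha * (f(w) - f(w0)) - target
        g = r * alpha * (1.0 - f(w) ** 2) * x
        w = w - (eta / alpha**2) * g
    return np.linalg.norm(w - w0)

for alpha in (1.0, 10.0, 100.0):
    print(f"alpha={alpha:6.1f}  ||w - w0|| = {train(alpha):.5f}")
```

As `alpha` grows, the distance traveled from `w0` shrinks roughly like 1/alpha, which is the regime in which the model is indistinguishable from its tangent (kernel) approximation.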
We appreciate the valuable comments and positive feedback from the reviewers, and have revised the paper accordingly to incorporate the comments.

Reviewer #1 (Stepsize and preset T.): Following the current analysis, for a general stepsize η, without averaging the iterates no convergence rate is available. In this paper we consider a neural network with one hidden layer; in particular, Proposition 4.7 shows that neural TD attains the global minimum of the MSBE. We will revise the "without loss of generality" claim and clarify this notation in the revision.
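The role of iterate averaging mentioned in the response can be sketched on a toy stochastic approximation problem (an illustrative Polyak-Ruppert averaging example, not the paper's neural TD setting): the running average of the iterates settles near the optimum even while the last iterate keeps fluctuating at a level set by the stepsize:

```python
import numpy as np

rng = np.random.default_rng(3)
theta_star = 2.0        # minimizer of the toy objective 0.5*(theta - theta_star)^2
theta, avg = 0.0, 0.0
eta, T = 0.1, 20000

for t in range(1, T + 1):
    g = (theta - theta_star) + rng.normal()   # noisy gradient (unit-variance noise)
    theta -= eta * g                          # constant-stepsize SGD iterate
    avg += (theta - avg) / t                  # running average of the iterates

print(abs(theta - theta_star), abs(avg - theta_star))
```

With a constant stepsize, the last iterate hovers at a noise floor of order sqrt(eta), while the averaged iterate concentrates around the optimum, which is why convergence rates are typically stated for the average.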